ABSTRACT

A study was undertaken to explore cost-effective ways
of making career ladder teacher evaluation system decisions based on
fewer measures, to assess the relationship of observational variables
to other data and final decisions, and to compare compensatory and
conjunctive decision models. Data included multiple scores from eight
data sources in the 1985-86 Tennessee Career Ladder Teacher 
Evaluation System. The data sources include: (1) classroom 
observation; (2) teacher/evaluator dialogues in planning, teaching 
strategies, and evaluation; (3) a peer questionnaire; (4) a principal 
questionnaire; (5) student questionnaires (primary, elementary, and 
secondary levels); (6) the Tennessee Career Ladder Professional
Skills Test; (7) Professional Development and Leadership Activities 
Summary; and (8) evaluator consensus judgments. Findings indicate 
that less data can be gathered without having any major impact on the 
decisions reached if one uses optimal weighting. Several fairly 
accurate models using various reduced sets of data were proposed.
Although classroom observation data were not highly related to the 
other variables, there was evidence indicating that the decisions 
reached without such expensive-to-gather data would be highly similar 
to the decisions actually reached. It was not possible to compare a 
purely compensatory model with a purely conjunctive model using these 
data. However, the comparison of the actual decisions reached in 
Tennessee with those made using a conjunctive data combination model 
gave no support for preferring a compensatory model. (TJH)

FISCAL VIABILITY, CONJUNCTIVE AND COMPENSATORY MODELS, AND CAREER LADDER
DECISIONS: AN EMPIRICAL INVESTIGATION

Abstract 

The purposes of the study included: (1) exploration of cost-effective
ways of making career ladder teacher evaluation system decisions based on 
fewer measures; (2) assessment of the relationship of observational variables 
to other data gathered and the final decisions; and (3) comparison of 
compensatory and conjunctive decision models. Data included multiple scores 
from eight data sources in the 1985-86 Tennessee Career Ladder Teacher 
Evaluation System. 

The findings from this study suggested that less data can be gathered 
without having any major impact on the decisions reached if one uses optimal 
weighting. Several fairly accurate models using various reduced sets of data 
were proposed. Although classroom observation data were not highly related to
the other variables, there was evidence indicating the decisions reached
without such expensive-to-gather data would be highly similar to the
decisions actually reached.

It was not possible to compare a purely compensatory model with a purely 
conjunctive model using these data. However, the comparison of the actual 
decisions reached in Tennessee with those made using a conjunctive data 
combination model gave no support for preferring a compensatory model.

High stakes decisions should be based on high quality data. In general,
the more data that are gathered, the better the decision-making process.
However, in gathering data for any decision one must consider the cost of 
gathering the data relative to the improvement in the decision made. One of 
the major feasibility standards listed in the Standards for Evaluation of 
Educational Personnel is fiscal viability (Stufflebeam & Brethower, 1987). 
Gathering more data will not always be cost-effective.

Moreover, when more than one piece of data is gathered one must decide 
how to weight the various pieces of data. The psychometric literature 
discusses two prominent models for combining data: the compensatory model and 
the conjunctive model. Also, a combination of the two can be used. In the 
compensatory model, high scores on one variable can compensate for low scores 
on another variable. Multiple regression is an example of this approach. In 
the conjunctive model an individual must score above the cutoff score on each 
measure used. An example of a combination model would be to allow 
compensation above the individual cutoff scores where a combined score higher
than the sum of the individual cutoff scores is required (see Mehrens,
Phillips, & Anderson, 1987).
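
The three ways of combining data described above can be sketched in a few lines of Python. This is an illustration only: the function names, the two example scores, the equal weights, and the cutoff values are assumptions chosen for the example and are not taken from any of the systems cited.

```python
def compensatory(scores, weights, composite_cutoff):
    """Pass if the weighted composite reaches the cutoff.

    A high score on one measure can offset a low score on another.
    """
    composite = sum(w * s for w, s in zip(weights, scores))
    return composite >= composite_cutoff


def conjunctive(scores, cutoffs):
    """Pass only if every measure reaches its own cutoff."""
    return all(s >= c for s, c in zip(scores, cutoffs))


def combination(scores, cutoffs, weights, composite_cutoff):
    """Pass only if every measure clears its cutoff AND the composite
    clears a (higher) composite cutoff."""
    return conjunctive(scores, cutoffs) and compensatory(
        scores, weights, composite_cutoff
    )


# Hypothetical candidate: strong on the first measure, weak on the second.
scores = [700, 440]
weights = [0.5, 0.5]      # assumed equal weights
cutoffs = [450, 450]      # assumed per-measure minimums
composite_cutoff = 550    # assumed composite requirement

print(compensatory(scores, weights, composite_cutoff))          # True
print(conjunctive(scores, cutoffs))                             # False
print(combination(scores, cutoffs, weights, composite_cutoff))  # False
```

The same candidate passes under the compensatory rule (composite of 570) but fails under the conjunctive and combination rules because of the single score below 450.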

When criterion data exist for a comparable group, the weights can be
established such that they result in the lowest number of incorrect decisions 
for that group. Then, through validity generalization those weights can be 
used for subsequent decisions for future comparable groups. However, for many 
decisions, there is no completely adequate criterion measure against which to 
determine optimal, mathematically derived, predictor weights.

The Tennessee Career Ladder Teacher Evaluation System is a case in
which both the issues of the cost/benefit ratio of gathering data and methods 
of combining data are relevant. To satisfy the legislative mandate for the 
career ladder program, the state needed to determine which teachers should be 
placed at which rungs of the career ladder. One of the basic principles of 
the system was that "multiple sources of data are essential to the development 
of a complete picture of teaching performance" (Tennessee Department of 
Education, 1986a, p. 2). In both 1985-86 and 1986-87 multiple data sources 
were combined using a combination compensatory/conjunctive approach. 

A variety of questions can be raised regarding the specific approaches 
used by Tennessee. For example, would their method be transportable to other 
states, or would it be considered too expensive? Could a judicious selection 
of a subset of the variables used result in a reasonably comparable set of 
candidates selected at a lower cost? Would different weighting procedures 
have resulted in much the same set of decisions? If not, what differences 
characterize the candidates selected under different systems? 

Many educators believe observation is the most valid way to evaluate 
teachers. In Tennessee, observational data included both low inference 
(frequency counts) and medium inference data (ratings). Are there any 
appropriate combinations of the observational variables that would result in
essentially the same decisions about teachers as would be made using the
non-observational data?

OBJECTIVES

The specific objectives of this paper were as follows: 

1. To determine for the 1985-1986 Tennessee data how well the less
expensive-to-gather data could, when optimally weighted,
predict the actual decisions made.

2. To determine how well the decisions made without the classroom 
observation data would match the actual decisions made. 

3. To determine the relationship between the observation variables
and all other pieces of data. 

4. To determine how similar the decisions would be under the 
compensatory and conjunctive weighting models. 

METHODOLOGY 

Instruments Used 

The 1985-86 Tennessee Career Ladder Evaluation System was based on the 
data from the following eight data sources: 

1) Classroom Observation

2) Teacher/Evaluator Dialogues in Planning, Teaching Strategies, and
Evaluation

3) Peer Questionnaire

4) Principal Questionnaire

5) Student Questionnaires (secondary, elementary, and primary levels)

6) Tennessee Career Ladder Professional Skills Test

7) Professional Development and Leadership Activities Summary

8) Evaluator Consensus Judgment

The Classroom Observations were based on the observations of a trained
three-member evaluation team. Each evaluator visited once. There were one
announced and two unannounced visits. Twenty-five scores were obtained: two
in the planning domain, 13 in the teaching strategies domain, one in

activities. These were jointly judged by two evaluators and two scores were
produced: (1) enhances instruction with new techniques, and (2) exhibits 
leadership to improve schooling. 

The Evaluator Consensus Judgment Scores were obtained after the data 
collection phase. The evaluators met to discuss the candidates and provide
consensus judgment scores on the first four domains: planning, teaching 
strategies, evaluation, and classroom management. Although the observation 
and dialogue scores were not available to the evaluators during consensus 
meetings, their personal notes from the school visits were. For this reason, 
the dialogue, observation and consensus data were not independent measures. 

All the data were combined in a set of complicated weighting and scaling 
procedures to provide five domain scores: Planning, Teaching Strategies, 
Evaluation, Classroom Management and Leadership. This was done by first 
combining all data except the principal and consensus scores by domain. These 
we call the data scores. The data scores were placed on a 200-800 scale (the 
principal and consensus scores were already on this scale), and the three sets 
were weighted and combined to form domain scores. While there were some 
differences across the domains, for domains 1-4 the data sources were weighted
about 65%, the principal ratings about 10% and the consensus scores about 25%.
For domain 5, the data sources were weighted 80% and the principal ratings 20%.
Domain scores 1-5 were then weighted .15, .35, .15, .25, and .10, 
respectively, to obtain a total composite score. 

To be qualified for a Career Level above Level 1 the teachers had to meet
or exceed a minimum scaled score of 450 in each domain and, in addition, have
a total weighted composite score that met or exceeded the qualifying score for
a particular level (600 for Level 2, and 700 for Level 3). Details of the
instruments can be found in the Teacher Orientation Manual and details
regarding the scoring can be found in the teacher's edition of the Career
Ladder Technical Manual (Tennessee State Department of Education, 1986a &
1986b).
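
The final step of the decision rule just described (domain weights of .15, .35, .15, .25, and .10, a minimum of 450 in every domain, and composite cutoffs of 600 and 700) can be written out as a short Python sketch. The function and constant names are hypothetical, the example teachers are invented, and the earlier scaling and weighting stages that produce the five domain scores are not reproduced here.

```python
# Weights for domains 1-5: Planning, Teaching Strategies, Evaluation,
# Classroom Management, Leadership (values as reported in the text).
DOMAIN_WEIGHTS = (0.15, 0.35, 0.15, 0.25, 0.10)
DOMAIN_MINIMUM = 450       # conjunctive part: floor for every domain
LEVEL_2_COMPOSITE = 600    # compensatory part: composite cutoffs
LEVEL_3_COMPOSITE = 700


def composite_score(domain_scores):
    """Weighted sum of the five domain scores (200-800 scale)."""
    return sum(w * s for w, s in zip(DOMAIN_WEIGHTS, domain_scores))


def career_level(domain_scores):
    """Return 1, 2, or 3 under the combined conjunctive/compensatory rule."""
    if min(domain_scores) < DOMAIN_MINIMUM:
        return 1  # any domain below the floor blocks Levels 2 and 3
    composite = composite_score(domain_scores)
    if composite >= LEVEL_3_COMPOSITE:
        return 3
    if composite >= LEVEL_2_COMPOSITE:
        return 2
    return 1


# Invented examples, not actual Tennessee cases.
print(career_level([720, 710, 690, 705, 650]))  # composite 701.25
print(career_level([800, 800, 800, 440, 800]))  # one domain under 450
print(career_level([620, 620, 620, 620, 620]))  # composite 620
```

The second example shows the conjunctive component at work: a very high composite does not qualify the teacher because one domain score falls below 450.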

As can be discerned from the above abbreviated description, the Tennessee 
Career Ladder Evaluation System uses a multiple data source approach that can 
be both expensive to maintain and difficult to explain:

"The processes of aggregating scores, rescaling them, weighting and 
combining them, ... was confusing to almost everyone. Once the system 
was implemented ..., explaining the system to the candidates became a major
concern." (McLarty, Furtwengler and Malo, 1985, p. 16). 

Nevertheless, while

"the multiple data source system is difficult and expensive to build,
relatively inflexible, and complicated to explain, it also provides a
thorough and equitable evaluation, is relatively stable, and is based on 
concepts which are logically appealing" (p. 19). 

Although the Tennessee Career Ladder System remains somewhat 
controversial, the evaluation component has been perceived as producing 
reasonably valid results. Thus, for our purposes, we used the current
classification as the criterion against which to compare other methods. 

Population/Sample 

Teachers completing evaluations for the 1985-86 cycle of the Career 
Ladder in Tennessee were used in the analyses. The 1985-86 cohort was chosen 
because there was more variability in the data than for the 1986-87 cohort. All
teachers who were in the General Education category were included.
Adaptations of the evaluation system were made for special education teachers,
Chapter 1 resource teachers, and vocational education teachers. Data for
these individuals were not included in our analyses.

The population of respondents was divided into two stratified random
samples where the stratification was based on whether the teacher was a
primary, elementary, or secondary teacher. One of the two groups was used
for the initial analyses and the other group was used for cross validation.
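
A stratified split of this kind can be sketched as follows. This is a minimal Python illustration, not the procedure actually used: the function name, the seed, and the small invented roster are all assumptions, and each teacher is represented simply as an (id, grade level) pair.

```python
import random
from collections import defaultdict


def stratified_halves(teachers, stratum_of, seed=0):
    """Randomly split teachers into two halves within each stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for teacher in teachers:
        by_stratum[stratum_of(teacher)].append(teacher)

    first, second = [], []
    for members in by_stratum.values():
        rng.shuffle(members)              # random order within the stratum
        half = (len(members) + 1) // 2    # odd strata put the extra case first
        first.extend(members[:half])
        second.extend(members[half:])
    return first, second


# Invented roster: 10 primary, 12 elementary, and 30 secondary teachers.
roster = (
    [(i, "primary") for i in range(10)]
    + [(i, "elementary") for i in range(10, 22)]
    + [(i, "secondary") for i in range(22, 52)]
)
first, second = stratified_halves(roster, lambda t: t[1], seed=1)
print(len(first), len(second))  # 26 26
```

Each half keeps the same mix of grade levels as the full roster, so a function fitted on the first half can be checked on a comparable second half.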

Analyses

The general analytic procedures were based on the four objectives listed 
earlier. First some preliminary descriptive analyses were completed to assist 
in the identification of meaningful ways to aggregate the data. Discriminant 
functions were run to provide data relevant to objectives 1 and 2. Canonical
correlation analyses were used for objective 3, and hit rates comparing a
conjunctive (by domain) approach to the original combined 
conjunctive/compensatory approach were obtained to address objective 4. 



RESULTS 

Preliminary Results 

An initial set of descriptive analyses was completed on the first half
sample. This sample consisted of 535 teachers: 97 primary teachers, 128
elementary teachers, and 310 secondary teachers. The breakdown by career
ladder level was as follows: 68 Level I, 142 Level II, and 325 Level III
teachers.

The means, standard deviations, minimum scores and maximum scores were
computed for each variable. Although these results are too voluminous to 
present in this paper, they are useful in explaining why some variables 
counted for little in subsequent analyses (high means and little variance). 
For example, several observations (observations 6, 14, 19, 22, 23, and 27) had
100) and the principal mean ratings were
Some of the more interesting descriptive

The seven observation variables which had extremely high means were as
follows: Score 6: whether or not the teacher provided the students with
correct information; Score 7: whether the teacher provided an appropriate
language model for students; Score 14: the proportion of students in the class
who were "on task"; Score 19: an index of whether student behavior was under
control; Score 22: whether the teacher treated students of different races and
both genders equitably; Score 23: the amount of time required to resume class
after an interruption (if one occurred); and Score 27: whether the teacher was
"on task."

These scores, except for #19 and #23, focused on identifying serious but
infrequent teacher errors. The score distribution for #19 resulted from the
complex pattern scoring procedure. That for #23 was due to the small number
of situations in which a naturally occurring interruption was observed.

All five Principal subdomain scores also had very high means (all above
770) and their variances were quite low. The final combined domain score
means were all well above the 600 minimum required for Career Ladder Level II.
Three of these means approached and two exceeded the 700 required for Level
III. The mean composite score of 690 was close to that required for Level
III. Out of 535 individuals, 325 (61%) were identified as Level III teachers.
(The distribution of composite scores was negatively skewed.)

Because of the large volume and considerable complexity of the data, as
part of our initial exploration we obtained correlation matrices and a series
of regression equations. Because the Tennessee Career Ladder data are
weighted and aggregated at multiple stages, it was necessary to determine at
what level of aggregation to examine the data. Three levels were selected:
(1) the domain score level (five scores, each incorporating data from most of
the instruments); (2) the subtotal score level (fourteen scores: five based on
all the data except the principal and consensus scores [hereafter called the
data scores], five from the principal, and four from the evaluator consensus);
and (3) the indicator level (61 scores: 23 from the observations, 10 from the
dialogues, 1 from the peer questionnaire, 3 from the student questionnaires, 4
from the professional skills test, 2 from the professional leadership and
development survey, 14 from the principal, and 4 from the evaluator consensus
judgment).

The domain score correlation matrix for the total sample of 535 cases is 
given in Table 2A. As can be seen, the correlations among the first four
domains were all above .64. Domain 5 (Leadership) appeared to measure
something a bit less related to the other domains. The correlations between 
the domains and the composite were, of course, related to the weights (.15, 
.35, .15, .25, and .10) of the domains in computing the composite. 

The subtotal domain score correlations are presented in Table 2B. Recall 
that the data scores included the observations and dialogues (as well as other
variables). The observations and dialogues were conducted by the evaluators,
who also provided the consensus scores. Thus, we would expect the data and
consensus domain scores to be correlated. It is noteworthy that the five
data subscores correlate somewhat less with each other than the first four
data subscores correlate with the consensus subscores. In general the data in
Table 2B suggest that the homo method-hetero trait triangles for consensus and
principal scores present higher correlations than the hetero method-homo
trait correlations. These data strongly suggest the lack of
convergent-discriminant validation of the traits (domains) (Campbell & Fiske,
1959).

The correlation matrix for the 61 indicators is presented in Table 2C.
Observation data were aggregated across occasions so occasion variance was 
treated as error. Therefore, many of the individual indicators based on 
observations, as expected, had low correlations with other variables. Indeed, 
the correlations among the observation indicators are quite low. However, the 
ten dialogue scores have fairly high correlations with each other as do the 
three student questionnaire scores, the four test scores, the two professional 
development and leadership scores, the fourteen principal scores, and the four 
consensus scores. Only the four consensus scores tended to correlate very 
much with the other scores. (Recall, however, the possible interaction of the 
observations, dialogue, and consensus scores.) Note that the consensus scores 
have low correlations with the four test scores (.02 to .23). They correlate 
moderately with the principal scores (.22 to .38). 



Insert Tables 2A through 2C About Here 



Regression equations using the backward stepwise deletion procedure were 
constructed to identify variables which might be deleted without significant 
loss of information. In each case, the composite score was treated as the 
dependent variable. When the five domain scores were treated as the
independent variables, none was deleted. When the composite was regressed on
the 14 subtotal scores again no variables were dropped although the beta
weights for the 5 subdomain scores from principals were all quite small.
However, when the composite was regressed onto the 61 indicators only 24
scores remained in the equation. The multiple R for those 24 was 0.974. Of 
the 37 variables deleted, 17 were from the observations, 5 were dialogue 
scores, 1 was a student questionnaire score, 1 a test score, 1 a Professional 
Development and Leadership (PDL) score and 12 were principal questionnaire 
scores. In a reduced analysis using 43 independent variables (all the above 
61 except the 14 principal questionnaire scores and the 4 consensus judgment 
scores), only 23 remained in the equation after stepwise deletion. The 
multiple R was 0.95 and the 20 variables deleted were: 14 observation 
variables, 3 dialogue, 1 student, 1 test and 1 PDL. 

Separate analyses by grade level (primary, elementary, and secondary) 
showed much the same results on both the descriptive statistics and the 
various regression analyses. Thus we decided to complete all the remaining 
analyses using the combined data. 

Variables for inclusion in the discriminant function and canonical 
correlation analyses were selected based on the purposes of the study, the 
initial findings, and the costs of obtaining certain types of data. Further, 
hit rates were computed and a set of contingency tables was constructed to
compare the actual outcome by levels with what would have been obtained using
strictly a multiple cutoff (conjunctive) approach on the domain scores.

Discriminant Analyses

One purpose of the study (objective 1) was to determine how well a subset
of data that was less expensive-to-gather could, when optimally weighted,
predict the actual decisions made. A specific sub-purpose (objective 2) was
to determine how well decisions made without expensive-to-gather observation
data would match actual decisions.

A series of discriminant function analyses were run based on 512 cases 
(the original 535 minus 23 cases which had at least one missing variable). 
Some variables were also dropped due to missing data. 

To provide "baseline" data the first discriminant analysis was run using 
all the data. Had we used the combined conjunctive/compensatory model (450
minimum on each domain; 600 and 700 composites for Level II and III) we would 
have had no misses. However, the discriminant function did not use the 
conjunctive component of the model. Further, the discriminant function 
assumes multivariate normality of the distributions and equal dispersion and 
covariance structures for the groups. Although the analysis is not very
sensitive to violations of these assumptions we knew they were violated to a
considerable degree. Therefore obtaining the baseline fit using all the data
seemed useful. The prior probabilities for the three levels were .13, .26,
and .61. Thus, the maximum chance criterion hit rate was .61 and the
proportional chance criterion was .46 (.13² + .26² + .61²).
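
The two chance criteria can be checked with a few lines of Python. The variable names are illustrative; the priors are the ones reported in the text.

```python
# Prior probabilities for Levels I, II, and III as reported.
priors = [0.13, 0.26, 0.61]

# Maximum chance criterion: always guess the largest group.
max_chance = max(priors)

# Proportional chance criterion: guess each group in proportion to its
# size, so the expected hit rate is the sum of the squared priors.
proportional_chance = sum(p ** 2 for p in priors)

print(round(max_chance, 2))           # 0.61
print(round(proportional_chance, 2))  # 0.46
```

Any classification rule worth its data-gathering cost should beat both figures, and the .61 figure is the harder benchmark here because Level III is so large.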

The discriminant analysis classification using all the scores is 
presented in Table 3A. Note that 93.2% of the cases were correctly
classified. There were 4.7% false positives and 2.1% false negatives.

The data in Tables 3B through 3F indicate the hit rate for various
deletions of independent variables: 3B omits the observation scores (90.8% 
hit rate); 3C omits the consensus scores (91.8%); 3D omits the dialogue scores 
(90.5%); 3E omits the observation and consensus scores (87.5%); and 3F omits 
the observation, dialogue and consensus scores (71.4%). 

Keeping in mind the baseline of 93.2% hits (cross validation evidence 
will come later), it would seem that omitting all observations and consensus 
scores and obtaining a hit rate of 87.5% (Table 3E) may be a reasonably
cost-effective approach. Training the evaluators to observe and conducting the
actual observations were very expensive. If judicious weighting of the
remaining variables produces a hit rate close to the baseline hit rate using 
all the variables it may be cost effective to forgo the observation and 
consensus data obtained from the evaluators. 

If one wished to save even more resources (time and money) the dialogues 
also could be omitted. However, the drop to a hit rate of 71.4% (Table 3F) 
seems like too large a loss. The outside evaluators are probably needed to 
conduct the dialogues to keep the system accurate and acceptable to the 
public. Note particularly the 3 to 1 ratio of false positives to false 
negatives if all the data from the outside evaluators are dropped. 

In an effort to make some substantive interpretations of the findings for 
the data in Table 3E, we looked at the discriminant structure correlations 
(loadings). These are simply the linear correlations between the independent 
variables and the discriminant function and reflect the variance shared 
between the independent variables and the discriminant function. Although the 
loadings are subject to instability, they are considered more valid than the
discriminant weights as a means of interpreting the discriminating power of
the independent variables (see Hair, Anderson, & Tatham, 1987, p. 91).

The structure correlations (loadings) for the data in Table 3E are 
presented in Table 4. These data indicate that the 10 dialogue scores
counted the most in differentiating between Levels II and III. The
principals' scores weighted heavily in differentiating between Levels I and
II. 



Insert Table 4 About Here



The data in Table 3F indicate clearly the loss of the dialogue scores in 
differentiating between Levels II and III. Note that there are 16 individuals 
actually in Level I who were classified in Level II, and 13 in Level I who 
were classified in Level III, for 29 errors. For Level II, 13 were classified 
in Level I and 81 in Level III. For Level III, 3 were classified in Level I 
and 22 in Level II. The major difference between the hit rates in 3E and 3F
was the false positives in 3F due to placing many more actual Level II
individuals into Level III (as mentioned earlier, this results in a 3 to 1 
false positive to false negative ratio). 
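
The ratio can be recomputed from the misclassification counts quoted above for Table 3F. The Python sketch below assumes that a "false positive" is a teacher classified above the actual level and a "false negative" one classified below it; the correctly classified counts are not given in the text and are therefore left out.

```python
# (actual level, classified level) -> number of teachers, from the text.
misclassified = {
    (1, 2): 16, (1, 3): 13,
    (2, 1): 13, (2, 3): 81,
    (3, 1): 3,  (3, 2): 22,
}

# Classified higher than the actual level.
false_positives = sum(
    n for (actual, predicted), n in misclassified.items() if predicted > actual
)
# Classified lower than the actual level.
false_negatives = sum(
    n for (actual, predicted), n in misclassified.items() if predicted < actual
)
ratio = false_positives / false_negatives

print(false_positives, false_negatives, round(ratio, 1))  # 110 38 2.9
```

The 81 Level II teachers placed in Level III account for most of the 110 false positives, which is the pattern the paragraph above describes.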

The discriminant structure correlations (not shown here) for the data in 
Table 3F (which omits observation, dialogue and consensus scores) indicated 
that the three student questionnaires counted the most in differentiating 
between Levels I and II. These are followed by the two Professional 
Development and Leadership Summary (PDL) scores. In addition, the 14
principal scores, one peer score, and four test scores were all significant in
differentiating between the first two levels. None of the indicators was
significant in differentiating between Levels II and III.

Cross Validated Discriminant Functions

The cross validated discriminant function hit rates are shown in Tables 
5A through 5F. The original and cross validated hit rates are summarized in
Table 6. The baseline cross validated hit rate using all the variables was 
84.5%. 

The cross validated hit rate for two of the other analyses (omitting 
observations and omitting dialogue scores) exceeded the baseline value! Of
course one expects less shrinkage in a cross validation if fewer variables are 
used. Nevertheless, obtaining these values suggests that the original 
function using the observations (and perhaps the dialogues) capitalized on 
random error. 

The data presented in Table 6 suggest several possible options for making 
career ladder decisions with less data: (1) omitting only observations, (2) 
omitting only consensus scores, (3) omitting only dialogue scores, and (4)
omitting both observations and consensus scores. The hit rate of 63% (25.5%
false positives and 11.5% false negatives) obtained when the observation,
dialogue, and consensus scores were all omitted suggests it may be unwise to
eliminate all data from the external evaluators. More will be said about this
in the discussion section.

Canonical Correlations

Some educators believe that the "bottom line" of teacher effectiveness is 
what actually goes on in the classroom and that external observations are 
needed to obtain such information. Thus, for these educators, the observation 
data represent the best measures for Career Ladder decisions. But, again, 
such data are costly to obtain. Perhaps other data, appropriately weighted, 
could serve as a surrogate set (objective 3).

Canonical correlation "measures the strength of the overall 
relationships between the linear composites of the predictor and criterion 
sets of variables" (Hair, Anderson, & Tatham, 1987, p. 187). Canonical 
correlations between the observations (dependent variables) and all the other 
data (independent variables) were obtained for 512 cases. (Twenty-three of
the original 535 were dropped due to missing data.) The results are
summarized in Table 7. The first four canonical variates produced Rs of
.90, .64, .61, and .45 respectively (p < .015). Note that these
values reflect the variance explained in the linear composites, not the
original variables. The percent of variance (redundancy index) of the
dependent variables (observational data) explained by the nonobservation data
was 23.8% by the first canonical variate and cumulated to 29.3% by the first
four variates. For the independent variables the corresponding values were 
20.3% for the first root and 24.1% cumulative variance explained by the 
dependent (observation) variables.

While there are no generally accepted standards for the minimum
acceptable redundancy index, we believe the data explain too small a percent
of the variance to advocate using one set of scores as a surrogate for the
other set. Consequently, we did not cross-validate the canonical correlation
equations.
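
The gap between a canonical R of .90 and a redundancy index of 23.8% follows from how a redundancy index is conventionally defined: the average squared loading of a set's variables on its own canonical variate, multiplied by the squared canonical correlation. The Python sketch below assumes that conventional definition was the one used here; the loadings in the first example are invented, and because the reported R is rounded, the implied share is approximate.

```python
def redundancy_index(loadings, canonical_r):
    """Share of a variable set's variance explained by the other set's
    canonical variate, given the set's loadings on its own variate."""
    shared_variance = sum(l * l for l in loadings) / len(loadings)
    return shared_variance * canonical_r ** 2


# Invented loadings for four variables on their own canonical variate.
example = redundancy_index([0.6, 0.5, 0.5, 0.6], 0.90)
print(round(example, 3))  # 0.247

# Working backwards from the reported first-root values (R = .90,
# redundancy = 23.8%): the first observation variate would carry roughly
# this share of the variance in the observation variables themselves.
implied_share = 0.238 / 0.90 ** 2
print(round(implied_share, 3))  # 0.294
```

Under that assumption, the first observation variate represents under a third of the variance in the observation scores, so even a canonical R of .90 leaves most of that variance unexplained by the other data.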



Conjunctive vs. Compensatory Models

The final purpose of the study (objective 4) was to explore whether 
different decisions would have been made under a purely conjunctive model.
To estimate an optimal conjunctive solution an iterative procedure was used to 
vary the cut scores in various domains until the resulting contingency table 
provided a maximum number of hits. 
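
The idea of varying cut scores until the contingency table yields the most hits can be illustrated with a small exhaustive search. This Python sketch is an assumption about the general approach, not the procedure actually used: the function names, the six toy cases, the two domains, and the grid of candidate cut scores are all invented for the example.

```python
from itertools import product


def classify(scores, cuts_l2, cuts_l3):
    """Purely conjunctive rule: every domain must clear the level's cut."""
    if all(s >= c for s, c in zip(scores, cuts_l3)):
        return 3
    if all(s >= c for s, c in zip(scores, cuts_l2)):
        return 2
    return 1


def hit_rate(cases, cuts_l2, cuts_l3):
    """Proportion of cases whose conjunctive level matches the actual one."""
    hits = sum(classify(s, cuts_l2, cuts_l3) == actual for s, actual in cases)
    return hits / len(cases)


def search_cuts(cases, candidates, n_domains):
    """Try every combination of candidate cut scores; keep the best."""
    best = (-1.0, None, None)
    for cuts_l2 in product(candidates, repeat=n_domains):
        for cuts_l3 in product(candidates, repeat=n_domains):
            rate = hit_rate(cases, cuts_l2, cuts_l3)
            if rate > best[0]:
                best = (rate, cuts_l2, cuts_l3)
    return best


# Toy data: (domain scores, actual level) for two hypothetical domains.
cases = [
    ((720, 710), 3), ((705, 730), 3),
    ((650, 640), 2), ((610, 700), 2),
    ((500, 520), 1), ((440, 600), 1),
]
candidates = list(range(400, 801, 50))  # assumed grid of cut scores

rate, cuts_l2, cuts_l3 = search_cuts(cases, candidates, n_domains=2)
print(rate)  # 1.0
```

The number of combinations grows as the grid size raised to twice the number of domains, which gives a sense of why such searches become costly as more domains are added.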

However, computer iterations are expensive and time consuming. Further, 
we know from data presented earlier, that one can eliminate some of the data 
sources and still make decisions highly comparable to the actual decisions
made in Tennessee. Because fiscal viability in data gathering is essential 
regardless of the data combination model employed, we obtained hit rates 
separately for two components of the domain scores: data scores and principal 
scores. We also obtained hit rates for the total domain scores. This type of 
analysis is not reported for consensus scores because these scores are not 
procedurally independent from the dialogue and observation scores. For this 
analysis it was not feasible to eliminate any of the instruments which went 
into the data scores (such as observation or dialogues) because separate 
scores were not readily available for these as instruments. 

For each set, prior probabilities of the hit rate were obtained
separately for each domain by setting the cut score between Career Ladder
Levels 2 and 3 equal to the score obtained by the person who ranked 325th out
of the 535 cases (because 325 individuals actually obtained Level 3 status)
and by setting the cut score between Levels 1 and 2 equal to the score of the
469th person (tied scores changed both ranks considerably in some instances). Next,
maximum hit rates were obtained for each domain by iteratively varying the 
cutoff values. Then, the best combinations of cut scores for two domains and 
for three domains were obtained. We could have continued the iterations for 
the best combination of five domains. But given the intercorrelations of the 
domains (Table 2), it seemed unlikely that the cross-validated hit rate would
have been much higher. The more relevant data from the conjunctive analyses
are presented in Tables 8A to 8C.

As can be seen from the data in Tables 8A to 8C, domains 1, 2, and 4
seemed to provide the best hit rates (using the final decision as the 
criterion). Using only one domain score, domain 2 gave the best results for 
the data scores with an 81.2% hit rate. Domain 1 worked best for the 
principal scores and total domain scores providing hit rates of 63.0% and 
86.8%, respectively. The best combination of two domains for the data scores 
was domains 1 and 2 with a hit rate of 86.1%. For principal scores it was a 
tie between 1 & 4 and 2 & 4 with a hit rate of 63.3%. For domain scores it 
was 90.0%. The best combination of three domains was domains 1, 2, and 4 for 
all types of scores. For the data scores, the hit rate was 88.3%. For the 
principal scores the hit rate was 63.7%, and for the domain scores it was 
91.9%. 

Given the weighting of the domain scores to obtain the composites (.15,
.35, .15, .25, and .10, respectively), it was not surprising that domains 2
and 4 should give some of the better hit rate data. Given the
intercorrelations of the domain subscores with each other and with the
composite (Table 2B), one can understand why adding the second and third
subdomain scores did not serve to increase the hit rate very much. As pointed
out before, there seems to be convergence of the data across domains within
method of data gathering. That is to say, the convergent/discriminant
validity data support the importance of method and are dissonant with the
hypothesis that the domains measure different constructs. This finding is
consistent with results reported by Furtwengler and others (1985, 1986).
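
The composite weighting can be checked against the descriptive data. The
sketch below assumes the composite is a simple weighted sum of the five final
domain scores, with the weights applied in domain order (Planning, Teaching
Strategy, Evaluation, Classroom Management, Leadership); the function name is
illustrative only. Because a weighted sum is linear, applying it to the final
domain score means in Table 1 should give the composite mean, and it does
give approximately 690, the value reported there.

```python
WEIGHTS = (0.15, 0.35, 0.15, 0.25, 0.10)  # domains 1-5, as given in the text


def composite(domain_scores, weights=WEIGHTS):
    """Weighted sum of the five domain scores (assumed form of the composite)."""
    if len(domain_scores) != len(weights):
        raise ValueError("expected one score per domain")
    return sum(w * s for w, s in zip(weights, domain_scores))


# Final domain score means from Table 1 (domains 1-5).
domain_means = (668, 679, 692, 701, 732)
print(round(composite(domain_means), 1))
```

Domains 2 and 4 together carry .60 of the total weight, which fits the
observation that they give some of the better hit rates.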

We cross-validated the total domain score classifications with the
conjunctive model using the iteratively derived optimum cut scores from the
first sample. A contingency table showing the cross-validated hit rates is
given in Table 9. Note that using only three domains the cross-validated hit
rate was 86.3% (down from 91.9% in the first sample). That was higher than
the hit rate of the cross-validated discriminant function using all the data
(84.5%; Table 5A).



Insert Table 9 About Here



DISCUSSION 

Objectives 1 & 2

Because external evaluators comprised the major expense of the
evaluations in the career ladder program, we analyzed the data with various
portions of their tasks omitted to determine how the decisions might differ.
The evaluators gathered the observation and dialogue data on the same school
visit. Training was needed for both these tasks. However, it is difficult to
separate out the costs of the two pieces of data. Further, conducting the
dialogues could conceivably have had some impact on the observation scores and
vice versa. The consensus data were gathered by bringing the evaluators
together at the close of the overall evaluation process. This was a
time-consuming and expensive process. It required extensive training time and
travel money. Because the consensus ratings were completed after the
observations and dialogues, the consensus discussions could not have impacted
the other two sets of data. The evaluators' perceptions from the
observations and dialogues were used in reaching the consensus.

The results of the original and cross validated discriminant analyses 
suggested several possibilities regarding making career ladder decisions with 
greater fiscal viability. 

(a) Dropping observations results in a cross validated hit rate of
85.7%, which exceeds the baseline cross validated hit rate (84.5%) with all
scores included. Thus, dropping observations would be wise if such action
would not result in less valid dialogue or consensus scores. This would
result in considerable savings.

Any decision regarding the elimination of observations must consider the
existing philosophy about the value of observations in career ladder decisions
(or any other personnel decision) and the psychometric characteristics of the
observation scores. Fourteen states now mandate observations for
certification (Sandefur, 1986), and individuals supporting such mandates would
surely hesitate to give them up. Others believe observations are invalid
and/or ethically and legally improper (Macmillan & Pendlebury, 1985; Scriven,
1987). Current research suggests that extensive observations are necessary to
provide observation scores of sufficient reliability for use in personnel
decisions (see, for example, Capie and Ellett, 1987).

(b) Dropping consensus scores provides a cross validated hit rate of 
81.2% compared with the base rate cross validated hit rate of 84.5%. This 
would result in considerable savings, but less savings than dropping the 
observations. However, it would not impact any of the other data. 

(c) Dropping dialogue scores results in a cross validated hit rate of 
84.6% which exceeds the cross validated hit rate using all variables. Again, 
if such an action did not reduce the validity of observation or consensus 
scores it would be a cost effective move. 

(d) Omitting observation and consensus scores results in a cross
validated hit rate of 76.6%. There are 3.7% more (of the total) false 
positives and 1.2% more false negatives using this approach than are obtained 
in the cross validation using all the variables. Observations and consensus 
scores are the two most expensive pieces of data to gather. State agencies 
should consider whether the benefit (assuming all variables as weighted give 
us something closer to the "Truth") outweighs the cost. 

(e) With omission of all the data provided by the external evaluators
(observations, dialogues, and consensus), the cross validated hit rate was
63.0%, with 25.5% false positives and 11.5% false negatives. While it is
possible the principals would have been more rigorous had there not been
outside evaluators, the opposite seems more likely. The finding here
certainly is in keeping with the results from other states. When an outside
evaluator is not used, the resulting distribution of scores is so negatively
skewed as to provide almost useless data. It would probably not be reasonable
to choose this option.
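
The percentages quoted in (a) through (e) can be recomputed from the
classification tables. The sketch below is illustrative only. It assumes that
a false positive is a case placed at a higher level than the actual decision,
that a false negative is a case placed at a lower level, and that the maximum
chance hit rate is the proportion of cases in the largest actual group. These
definitions are not stated in the text, but they reproduce the figures
printed with Table 5A.

```python
def classification_summary(table):
    """table[i][j] = number of cases with actual level i+1 and predicted level j+1."""
    size = len(table)
    n = sum(sum(row) for row in table)
    hits = sum(table[i][i] for i in range(size))
    # Assumed definitions: predicted above actual = false positive,
    # predicted below actual = false negative.
    false_pos = sum(table[i][j] for i in range(size) for j in range(size) if j > i)
    false_neg = sum(table[i][j] for i in range(size) for j in range(size) if j < i)
    largest_group = max(sum(row) for row in table)
    return {
        "hit": 100 * hits / n,
        "false_pos": 100 * false_pos / n,
        "false_neg": 100 * false_neg / n,
        "max_chance": 100 * largest_group / n,
    }


# Cell counts from Table 5A (cross validation using all data).
table_5a = [
    [41, 20, 3],
    [7, 118, 25],
    [0, 24, 272],
]
for name, value in classification_summary(table_5a).items():
    print(f"{name}: {value:.1f}%")
```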

Objective 3 

Professionals differ strongly with respect to the value and importance
of observations. Although dropping the observations from the discriminant
function does not negatively impact the cross validated hit rate, the
reasonably low canonical correlations (and redundancy indices) suggest that
the observations cannot be used as a surrogate for the other data, or vice
versa.

Initially it may seem the discriminant functions and the canonical 
correlations produced contradictory results. How can dropping a reasonably 
independent variable (as the canonical correlations suggest) not lower the 
hit rate in a discriminant function? At least one explanation is that the 
variation in observation scores was largely occasion variance. Whatever the
variances in the observation variables indicate, it is obvious that using only
the observation data would result in basing the career ladder decision on
something quite different than that indicated by the total set of data used in
Tennessee.

Objective 4 

Because of the way Tennessee obtained, scored, and aggregated data, it was
not feasible to compare the multiple cutoff (conjunctive) and compensatory
approaches by instrument. However, using the conjunctive approach at the
domain level indicates that a reasonably high hit rate can be obtained by
using only data scores from three domains (88.3%). The total domain scores,
again using only three domains, produced a hit rate of 91.9%. Cross-validated,
this became 86.3%. This was higher than the cross-validated discriminant
function hit rate. Although we do not have a totally congruent comparison
between conjunctive and compensatory models, the data we do have do not
support the use of a compensatory over a conjunctive model.



SUMMARY AND CONCLUSIONS 
The Tennessee Career Ladder Evaluation System was an extremely high
stakes evaluation. Considerable controversy existed within the state over 
whether there should be a career ladder system and whether adequate 
evaluations could be performed. It was essential that any developed and 
implemented system be based on concepts which were logically appealing using 
data collection methods which allowed for objective scoring processes. It was 
necessary given the political/educational climate to build a multiple data
source system. 

However, after the fact, it is useful to examine the system to determine 
whether a less costly and/or more easily explainable approach could serve in 
reaching essentially the same conclusions. In addition, a close look at the
data furthers our understanding of the educational constructs (domains)
measured and our ability to measure them. The results of our analyses should
be useful in any revisions of the Tennessee system, to other states or
districts planning a system, and indeed to theorists in educational personnel 
evaluation. 

The descriptive analyses indicated that several indicators were not 
useful for making differential career ladder decisions. For example, six 
observation variables had means greater than 99 (100 point maximum), and the 
principal mean ratings were very high (all domain means above 768 on a 200-800
scale). This descriptive information alone would be useful for any evaluation 
system revision. 

The correlation matrices of the domain scores (Table 2A) and the domain 
subscores (Table 2B) indicate that the data are more impacted by method than
trait, suggesting a lack of construct validity. Both practitioners and
theorists need to consider the implications of this for future personnel 
evaluation. 

The discriminant function analyses and the related structure correlation 
matrices indicated that omitting both the expensive-to-gather observation and
consensus scores still allowed a cross-validated hit rate of 76.6% when the
actual career ladder placement served as the criterion measure. With those
scores omitted, the principals' scores were useful in differentiating between
Career Ladder Levels 1 and 2, but not between Levels 2 and 3. The dialogue 
scores were important for differentiation at the higher level. 

Although observation data could be omitted from the discriminant function
without reducing the cross-validated hit rate, canonical analysis indicated
that only 29.3% of the observation data variance was explained by the other 
data and that only 24.1% of the variance of the other data was explained by 
the observation data. A reasonable conclusion is that the variance for the 
observation data contained a large portion of occasion variance treated as 
error. 

The conjunctive model data indicated that with an optimal cut score, one 
can reproduce the actual decisions made in Tennessee with fair accuracy (91.9% 
hit rate, cross-validated to 86.3%) using the domain scores for only three
domains. Alternatively, the data scores alone for these three domains,
eliminating principal and consensus scores, could be used. This gave an
original hit rate of 88.3%, which is somewhat lower, but would result in 
considerable savings. 

All of these conclusions lead us to the following suggestions to consider 
in building future career-ladder evaluation systems: 

1. Aggregate and report data by instrument rather than by domain. 

2. Carefully evaluate the cost-effectiveness of any observation 
system. 

3. Eliminate principal evaluations. 

4. Use a conjunctive model to combine the data. 

The above suggestions are given in decreasing order regarding their
empirical support from our study. Certainly following one suggestion may 
impact the value of following another. For example, aggregating and reporting 
data by instrument rather than domain may greatly impact the decision 
regarding the choice of a data combination model. 

Finally, we recognize all of these suggestions must be considered in 
light of the political and educational milieus and the fiscal viability of any 
evaluation system. 

Footnotes

In addition, in order to make Level I, all applicants were required to 
pass a reading/writing test. Only two individuals in the sample failed 
on this criterion. Those who failed to pass generally dropped out of the 
evaluation process. 

Note also that there were some optimum cut scores that included a range 
of values. We arbitrarily chose to use the lowest value that resulted in 
maximum accuracy. Using a different value in the range would change the 
ratio of false positives and false negatives. 

Table 1

Descriptive Data for Some of the Scores from the Tennessee
Evaluation Instrument (N=535)

                                Mean    St. Dev.   Min.    Max.   Possible Range

Observation Score Number
   6                           99.82      1.04     83.7    100       0-100
   7                           94.95      7.60     50.0    100       0-100
  14                           99.98       .04     99.7    100       0-100
  19                           99.64       .67     94.1    100       0-100
  22                           99.97       .72     83.3    100       0-100
  23                           99.90       .13     99.4    100       0-100 (a)
  27                           99.56      1.44     83.7    100       0-100

Domain #   Principal
   1   Planning                  778      35.4      525    800      200-800
   2   Teaching Strategy         777     (?).7      460    800      200-800
   3   Evaluation                772      38.7      550    800      200-800
   4   Classroom Management      775      46.0      350    800      200-800
   5   Leadership                772      38.9      550    800      200-800

Domain #   Consensus
   1   Planning                  681     104        300    800      200-800
   2   Teaching Strategy         667     112        200    800      200-800
   3   Evaluation                662     112        200    800      200-800
   4   Classroom Management      670     122        200    800      200-800

Domain #   Final Domain Score
   1                             668     105        358    793      200-800
   2                             679      85        350    782      200-800
   3                             692      75        373    788      200-800
   4                             701      78        340    796      200-800
   5                             732      52        451    795      200-800

Composite Score                  690      71        376    780      200-800

(a) N = 37 for this case.
(?) Leading digits not legible in the source.

Table 2A

Correlation Matrix of the Domain Scores
(N=535)

Domain                        1       2       3       4       5    Composite

1 Planning                    1     .738    .646    .669    .443     .847
2 Teaching Strategy                   1     .727    .806    .431     .947
3 Evaluation                                        .769    .377     .842
4 Classroom Management                                      .369     .906
5 Leadership                                                         .510

Table 2B

Correlation Matrix of Domain Subscores

                       2    3    4    5    6    7    8    9   10   11   12   13   14   15

 1. Data - P         .59  .45  .51  .39  .71  .66  .64  .57  .27   --   --   --   --  .78
 2. Data - TS          -  .58  .71  .38  .68  .67  .63  .63  .33  .35  .26  .30  .39  .89
 3. Data - E                -  .68  .29  .55  .55  .65  .53  .26  .24  .19  .23  .24  .73
 4. Data - CM                    -  .34  .64  .62  .61  .64  .27  .30  .21  .29  .22  .83
 5. Data - L                          -  .38  .34  .35  .25  .32  .31  .28  .24  .29  .48
 6. Consensus - P                          -  .87  .86  .78  .35  .36  .27  .33  .31  .87
 7. Consensus - TS                              -  .85  .78  .34  .34  .25  .32  .28  .86
 8. Consensus - E                                    -  .78  .36  .38  .28  .35  .32  .84
 9. Consensus - CM                                        -  .31  .37  .24  .37  .26  .81
10. Prin - P                                                   -  .87  .83  .75  .78  .41
11. Prin - TS                                                       -  .82  .89  .76  .43
12. Prin - E                                                             -  .73  .75  .33
13. Prin - CM                                                                 -  .67  .41
14. Prin - L                                                                       -  .37
15. Composite                                                                           -

-- Entry not legible in the source.

P  = Planning
TS = Teaching Strategy
E  = Evaluation
CM = Classroom Management
L  = Leadership

Table 2C-1

Correlation Matrix of the 61 Indicators
[Observation indicators. Matrix entries are not recoverable from the source.]

Table 2C-2

Correlation Matrix of the 61 Indicators
[Observation, Dialogue, Peer, Student, PST, and PDL indicators. Matrix
entries are not recoverable from the source.]

Table 2C-3

Correlation Matrix of the 61 Indicators
[Observation, Dialogue, Peer, Student, PST, PDL, Principal, and Consensus
indicators and the Composite. Matrix entries are not recoverable from the
source.]

Table 3A

Predicted Career Ladder Level Using Discriminant Function Analyses With All Data

Actual        Predicted Group Membership          Canonical
Group           1       2       3     Total       Correlation      P

  1            55      10       0       65
  2             3     119      14      136
  3             0       8     303      311           .902        .000
Total          58     137     317      512           .446        .000

Percent of cases correctly classified   93.2%
Percent of false positives               4.7
Percent of false negatives               2.1



Table 3B

Predicted Career Ladder Level Using Discriminant Function Analysis &
Omitting Observation Scores

Actual        Predicted Group Membership          Canonical
Group           1       2       3     Total       Correlation      P

  1            55      10       0       65
  2             6     114      16      136
  3             0      15     296      311           .892        .000
Total          61     139     312      512           .351        .003

Percent of cases correctly classified   90.8%
Percent of false positives               5.1
Percent of false negatives               4.1

Table 3C

Predicted Career Ladder Level Using Discriminant Function Analyses &
Omitting Consensus Scores

Actual        Predicted Group Membership          Canonical
Group           1       2       3     Total       Correlation      P

  1            56       9       0       65
  2             4     119      13      136
  3             0      16     295      311           .883        .000
Total          60     144     308      512           .410        .002

Percent of cases correctly classified   91.8%
Percent of false positives               4.3
Percent of false negatives               3.9



Table 3D

Predicted Career Ladder Level Using Discriminant Function Analyses &
Omitting Dialogue Scores

Actual        Predicted Group Membership          Canonical
Group           1       2       3     Total       Correlation      P

  1            54      11       0       65
  2             7     109      21      137
  3             0      10     305      315           .886        .000
Total          61     130     326      517           .414        .000

Percent of cases correctly classified   90.5%
Percent of false positives               6.2
Percent of false negatives               3.3

Table 3E

Predicted Career Ladder Level Using Discriminant Function Analyses &
Omitting Observation & Consensus Scores

Actual        Predicted Group Membership          Canonical
Group           1       2       3     Total       Correlation      P

  1            54      10       1       65
  2             9     105      22      136
  3             0      22     289      311           .844        .000
Total          63     137     312      512           .283        .158

Percent of cases correctly classified   87.5%
Percent of false positives               6.4
Percent of false negatives               6.1







Table 3F

Predicted Career Ladder Level Using Discriminant Function Analyses &
Omitting Observation, Dialogue & Consensus Scores

Actual        Predicted Group Membership          Canonical
Group           1       2       3     Total       Correlation      P

  1            36      16      13       65
  2            13      43      81      137
  3             3      22     290      315           .663        .000
Total          52      81     384      517           .207        .532

Percent of cases correctly classified   71.4%
Percent of false positives              21.3
Percent of false negatives               7.4



Table 4

Structure Correlation Matrix for Analysis 3E

Variable                  Function 1 **    Function 2

Dialogue 5                   .51 *           -.09
Dialogue 7                   .50 *           -.20
Dialogue 6                   .49 *           -.18
Dialogue 4                   .48 *           -.10
Dialogue 2                   .48 *           -.22
Dialogue 1                   .48 *           -.17
Dialogue 3                   .40 *           -.13
Dialogue 8                   .40 *           -.18
Dialogue 9                   .38 *           -.29
Dialogue 11                  .37 *           -.07
Student 2                    .34 *            .15
Student 1                    .32 *            .29
PDC 2                        .26 *            .07
PDC 1                        .26 *            .03
Peer                         .20 *            .06
Prin 15                      .19 *            .04
Prin 13                      .17 *            .16
Prin 9                       .16 *            .10
Pro Skills Test 1            .16 *            .09
Prin 14                      .15 *            .11
Pro Skills Test 3            .08 *            .00

Prin 12                      .20              .44 *
Prin 5                       .20              .41 *
Prin 11                      .18              .40 *
Prin 7                       .20              .37 *
Student 3                    .34              .37 *
Prin 1                       .21              .33 *
Prin 3                       .20              .32 *
Prin 4                       .22              .32 *
Prin 2                       .20              .31 *
Prin 6                       .20              .25 *
Pro Skills Test 4            .12              .21 *
Prin 18                      .15              .16 *
Pro Skills Test (?)          .12              .13 *

** Function 1 primarily differentiates Levels II and III whereas function 2
primarily differentiates Levels I and II.

* Indicates the loading adds significantly to the discriminant function at the .05
level of statistical significance.

(?) Test number not legible in the source.

Table 5A

Cross Validated Discriminant Function Hit Rate Using All Data

Actual        Predicted Group Membership
Group           1       2       3     Total

  1            41      20       3       64
  2             7     118      25      150
  3             0      24     272      296
Total          48     162     300      510

Percent of cases correctly classified   84.5%
Percent of false positives               9.4%
Percent of false negatives               6.1%
Max. Chance Hit Rate                    58.0%



Table 5B

Cross Validated Discriminant Function Hit Rate Omitting Observation Scores

Actual        Predicted Group Membership
Group           1       2       3     Total

  1            45      16       3       64
  2             6     122      22      150
  3             0      26     270      296
Total          51     164     295      510

Percent of cases correctly classified   85.7%
Percent of false positives               8.0%
Percent of false negatives               6.3%
Max. Chance Hit Rate                    58.0%

Table 5C

Cross Validated Discriminant Function Hit Rate Omitting Consensus Scores

Actual        Predicted Group Membership
Group           1       2       3     Total

  1            34      27       3       64
  2             8     113      29      150
  3             0      29     267      296
Total          42     169     299      510

Percent of cases correctly classified   81.2%
Percent of false positives              11.6%
Percent of false negatives               7.3%
Max. Chance Hit Rate                    57.8%



Table 5D

Cross Validated Discriminant Function Hit Rate Omitting Dialogue Scores

Actual        Predicted Group Membership
Group           1       2       3     Total

  1            46      19       1       66
  2             9     111      31      151
  3             0      19     278      297
Total          55     149     310      514

Percent of cases correctly classified   84.6%
Percent of false positives               9.9%
Percent of false negatives               5.5%
Max. Chance Hit Rate                    57.8%

Table 5E

Cross Validated Discriminant Function Hit Rate
Omitting Observations and Consensus Scores

Actual        Predicted Group Membership
Group           1       2       3     Total

  1            36      25       3       64
  2             6     105      39      150
  3             0      31     265      296
Total          42     161     307      510

Percent of cases correctly classified   76.6%
Percent of false positives              13.1%
Percent of false negatives               7.3%
Max. Chance Hit Rate                    58.0%



Table 5F

Cross Validated Discriminant Function Hit Rate
Omitting Observations, Dialogue & Consensus Scores

Actual        Predicted Group Membership
Group           1       2       3     Total

  1            26      18      22       66
  2            22      38      91      151
  3             5      32     260      297
Total          53      88     373      514

Percent of cases correctly classified   63.0%
Percent of false positives              25.5%
Percent of false negatives              11.5%
Max. Chance Hit Rate                    57.8%

Table 6

Original and Cross Validated Discriminant Functions

                                          Hit      Percent       Percent
                                          Rate     False Pos.    False Neg.

All Variables
   Original                               93.2        4.7           2.1
   Cross-Validated                        84.5        9.4           6.1
Omitting Observations
   Original                               90.8        5.1           4.1
   Cross-Validated                        85.7        8.0           6.3
Omitting Consensus Scores
   Original                               91.8        4.3           3.9
   Cross-Validated                        81.2       11.6           7.3
Omitting Dialogue Scores
   Original                               90.5        6.2           3.3
   Cross-Validated                        84.6        9.9           5.5
Omitting Observ. & Consensus
   Original                               87.5        6.4           6.1
   Cross-Validated                        76.6       13.1           7.3
Omitting Observ., Dialogue, & Consensus
   Original                               71.4       21.3           7.4
   Cross-Validated                        63.0       25.5          11.5

Table 7

Summary Results of the Canonical Correlation Analysis (N=512)

                                        Percent Ys             Percent Xs
Canonical                               Explained by Xs        Explained by Ys
Variates     Canonical R       p        (Redundancy Index)     (Redundancy Index)

   1             .90         .000            23.8                   20.3
   2             .64         .000             1.7                    1.4
   3             .61         .000             2.9                    2.0
   4             .45         .015              .9                     .4
 Total                                       29.3                   24.1

Table 8A

Classification Using the Conjunctive Model: Data Scores

                                   Cut Score       Cut Score        Percent Classified
                        Domain     1 vs 2          2 vs 3         F. Neg.    Hit    F. Pos.

Priors
  Planning                1        441             702             13.3     73.7     13.0
  Teaching Strategy       2        566             707             10.3     79.3     10.3
  Evaluation              3        614             710             16.0     67.7     16.4
  Classroom Management    4        615             717             12.8     72.9     13.7
  Leadership              5        707             728             19.0     62.0     19.0

Cutoff-Best
Single Domain             2        573             702              5.6     81.2     13.2

Cutoff-Best Combination
of Two Domains           1&2       363,573         583,690          7.3     86.1      6.6

Cutoff-Best Combination
of Three Domains        1,2,4      360,444,600     570,690,702      6.8     88.3      4.9

Table 8B

Classification Using the Conjunctive Model: Principal Scores

                                   Cut Score       Cut Score        Percent Classified
                        Domain     1 vs. 2         2 vs. 3        F. Neg.    Hit    F. Pos.

Priors
  Planning                1        735             781             19.9     57.3     22.7
  Teaching Strategy       2        742             781             20.3     58.3     21.4
  Evaluation              3        --              --              15.4     59.0     25.6
  Classroom Management    4        732             778              --       --       --
  Leadership              5        719             750             18.1     57.1     24.8

Cutoff-Best
Single Domain             1        704             720              2.3     63.0     34.8

Cutoff-Best Combination
of Two Domains           1&4       704,660         716,720          3.8     63.3     32.9
  Tied With              1&2       704,700         716,710           --     63.3     33.8

Cutoff-Best Combination
of Three Domains        1,2,4      704,700,660     728,706,720      4.3     63.7     32.0
  Tied With             1,4,5      70?,660,600     728,725,710      4.9     63.7     31.4

-- or ? marks an entry or digit not legible in the source.

Table 8C

Classification Using the Conjunctive Model: Total Domain Scores

                                   Cut Score       Cut Score        Percent Classified
                        Domain     1 vs 2          2 vs 3         F. Neg.    Hit    F. Pos.

Priors
  Planning                1        511             700             11.7     77.1     11.3
  Teaching Strategy       2        573             695              7.1     85.7      7.1
  Evaluation              3        615             705             10.9     78.2     10.9
  Classroom Management    4        611             716              8.3     83.5      8.3
  Leadership              5        712             737             19.5     61.3     19.2

Cutoff-Best
Single Domain             2        550             693              4.5     86.8      8.6

Cutoff-Best Combination
of Two Domains           1&2       465,550         598,686          4.7     90.0      5.3

Cutoff-Best Combination
of Three Domains        1,2,4      465,500,591     620,678,700      6.2     91.9      1.9

TABLE 9

Cross Validated Hit Rate Using the Conjunctive Model With Three Domain Scores*

Actual
Decision         Conjunctive Decision (Level)           Row
(Level)            I           II          III         Total

I                 50           15            1           66
                   9.5          2.9           .2         12.5

II                22          126           10          158
                   4.2         24.0          1.9         30.0

III                0           24          278          302
                   0.0          4.6         52.9         57.4

Column            72          165          289          526
Total             13.7         31.4         54.9        100.0

* The three domains were Planning, Teaching Strategies, and Classroom Management. The
first number is the number of individuals in the cell. The second number is the percent
of individuals in the cell.

Percent of cases correctly classified   86.3%
Percent of false positives               4.9%
Percent of false negatives               8.8%



